Retrieving Lexical Semantics from Multilingual Corpora

Authors

  • Ahmad R. Shahid
  • Dimitar Kazakov
Abstract

This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual ‘synsets’. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. The reported results are for English, German, French, and Greek using the Europarl parallel corpus.
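
One way to picture tags that act as multilingual 'synsets' is sketched below. It is a minimal illustration, not the authors' implementation: it assumes the corpus has already been word-aligned across the four languages, and it treats each distinct tuple of aligned words as one synset. The toy data and the names `LANGS`, `aligned_tokens`, `build_synsets`, and `annotate` are hypothetical.

```python
# Language order used in every aligned tuple (an assumption of this sketch).
LANGS = ("en", "de", "fr", "el")

# Hypothetical toy data: each tuple holds words assumed to be aligned
# with one another across English, German, French, and Greek.
aligned_tokens = [
    ("bank", "Bank", "banque", "τράπεζα"),   # financial institution
    ("bank", "Ufer", "rive", "όχθη"),        # river bank
    ("bank", "Bank", "banque", "τράπεζα"),   # financial institution again
]


def build_synsets(aligned):
    """Assign a numeric id to every distinct tuple of aligned words."""
    ids = {}
    for tup in aligned:
        if tup not in ids:
            ids[tup] = len(ids)
    return ids


def annotate(aligned, synset_ids, lang="en"):
    """Tag each token of one language with the id of its multilingual tuple."""
    col = LANGS.index(lang)
    return [(tup[col], synset_ids[tup]) for tup in aligned]


synset_ids = build_synsets(aligned_tokens)
tags = annotate(aligned_tokens, synset_ids)
print(tags)
```

In this toy setting the two senses of English "bank" receive different tags because their translations differ, which is the intuition behind using translations to separate word senses.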

Similar resources

Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling

issue of Polibits includes a selection of papers related to the topic of processing of semantic information. Processing of semantic information involves usage of methods and technologies that help machines to understand the meaning of information. These methods automatically perform analysis, extraction, generation, interpretation, and annotation of information contained on the Web, corpus, nat...

Crosslingual and Multilingual Construction of Syntax-Based Vector Space Models

Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a synta...

Standards & best practice for multilingual computational lexicons: ISLE MILE and more

ISLE (International Standards for Language Engineering) is a transatlantic standards oriented initiative under the Human Language Technology (HLT) programme within the EU-US International Research Co-operation. It is a continuation of the European EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, carried out through a number of subsequent projects funded by the Europ...

Unsupervised Construction of a Multilingual WordNet from Parallel Corpora

This paper outlines an approach to the unsupervised construction from unannotated parallel corpora of a lexical semantic resource akin to WordNet. The paper also describes how this resource can be used to add lexical semantic tags to the text corpus at hand. Finally, we discuss the possibility to add some of the predicates typical for WordNet to its automatically constructed multilingual versio...

Predicting Lexical Relations between Biomedical Terms: towards a Multilingual Morphosemantics-based System

This paper addresses the issue of how semantic information can be automatically assigned to compound terms, i.e. both a definition and a set of semantic relations. This issue is particularly crucial when elaborating multilingual databases and when developing cross-language information retrieval systems. The paper shows how morpho-semantics can contribute in the constitution of multilingual lexi...

Journal title:
  • Polibits

Volume 41  Issue 

Pages  -

Publication date: 2010